Hashimoto Electric announced that Hashimoto Electric released the
JCN-compatible PC "GNW Series" on 18th. Hashimoto Electric developed and
released his own designed PC. But because of opposing to the tendency 
to low price competition, Hashimoto Electric developed JCN-compatible PC 
which was a global industry-standard. Hashimoto Electric released a total 
of six models, seventeen designs of a desktop type PC and a notebook type 
PC. All models mounted Jofum's CPU "597", sophisticated soft OS and
V-OS 98. These products were aimed at a personal user or a network system
of a company. Parts were produced from Taiwan, and the average overseas 
procurement rate for parts increased from thirty percents to seventy 
percents. As a result, the manufacturing cost was reduced. The price 
of a desktop type was from one hundred seventy eight thousand yen to 
three hundred seventy eight thousand yen. A notebook type equipped with 
a black-and-white display was from two hundred twenty eight thousand yen
to three hundred forty eight thousand yen. A notebook type equipped with 
a color display was from four hundred twenty eight thousand yen to seven
hundred forty eight thousand yen.
<add-anno>

<article content sentence end expression=predicate announce present aspect=past>
<releasing entity organization info>
<organization name>Hashimoto Electric (0001)</organization name>

</releasing entity organization info>
<release date type=date value=1993-10-18>
<day>18th</day>
</release date>

<article content • declinable word type 1 field=sell-product aspect=past>
<sales info • declinable word type 1>

<sales info • declinable word type 10>
<on-sale product info>
<product info>
<type>JCN-compatible PC</type>

<product name>GNW Series</product name>
</product info>
</on-sale product info>
</sales info • declinable word type 10>
</sales info • declinable word type 1>
</article content • declinable word type 1>
</article content>
<article content remainder>
...developed and released his own designed PC. But because of opposing
to the tendency to low price competition, Hashimoto Electric introduced
JCN-compatible PC which was a global industry-standard. Hashimoto
Electric released a total of six models, seventeen designs of a desktop
type PC and a notebook type PC. All models mounted Jofum's CPU "597",
sophisticated soft OS and V-OS 98. These products were aimed at a
personal user or a network system of a company. Parts were produced from
Taiwan, and the average overseas procurement rate for parts was from
thirty percents to seventy percents. As a result, the manufacturing cost
was reduced.

The price of a desktop type was ... 
</article content remainder> 

<price type=price unit=yen value="from 178000 to 178000">

<amount>178,000</amount>
</price>

<article content remainder>
... three hundred seventy eight thousand yen. A notebook type equipped
with a black-and-white display was from two hundred twenty eight thousand
yen...
Hokkaido Ohki Fork Lift (Ishikari-machi, Ishikari-kannai,
president Akutagawa Ryutaro) which was a sales subsidiary of
Ohki Lift in Hokkaido merged in Higashi-Hokkaido Ohki Fork
Lift (Memuro-machi, Tokachi-kannai, the same president) on 1st.
This merger was carried out for the purpose of bolstering a
nationwide sales network. This aimed to unify the management
and to have the system ready for the efficient service.
New company name was Hokkaido Ohki Fork Lift and Mr. Akutagawa
was inaugurated as the president. The company is capitalized
at two hundred and forty million yen and has one hundred thirty
employees.
<add-anno>

<article content sentence end expression=predicate announce absent>

<article content • declinable word type 0 field=merger information aspect=past>
<merger info • declinable word type 0>

<merging entity organization info type=merging entity organization info>

<related organization name>Ohki Lift (020)</related organization name>
<related organization business category>

sales subsidiary in Hokkaido</related organization business category>
<organization name>Hokkaido Ohki Fork Lift (021)</organization name>

<merging organization supplementary info 1 type=organization supplementary info>
<element 1 no. of elements=2 type=element>

<organization location>Ishikari-machi, Ishikari-kannai</organization location>
</element 1>

<element 2 no. of elements=2 type=element>
<name>Akutagawa Ryutaro (0251)</name>

<managerial position>president</managerial position>
</element 2>

</merging organization supplementary info 1>
<organization name>Higashi-Hokkaido Ohki Fork Lift (021)</organization name>

<merging organization supplementary info 2 type=organization supplementary info>

<element 1 no. of elements=2 type=element>

<organization location>Memuro-machi, Tokachi-kannai</organization location>
</element 1>

<element 2 no. of elements=2 type=element
reference type=same 1 reference target=previous (0251)>

<reference expression>same</reference expression>
</element 2>

</merging organization supplementary info 2>

</merging entity organization info>
<merger date type=date value=1994-04-01>

<day>1st</day>

</merger date>
</merger info • declinable word type 0>

</article content • declinable word type 0>

</article content>

<article content remainder>
<article content remainder 0>
...was carried out for the purpose of bolstering a nationwide
sales network. This aimed to unify the management and to have
the system ready for the efficient service.
</article content remainder 0>
<new organization name>Hokkaido Ohki Fork Lift</new organization name>
<article content remainder 0> Mr. Akutagawa was inaugurated as the president. The capital was
</article content remainder 0>
</article content remainder>

<price type=price unit=yen value="from 240000000 to 240000000">
<capital>two hundred forty million</capital>
</price>

<article content remainder> one hundred thirty employees

</article content remainder>

</add-anno>
DESCRIPTION

DOCUMENT PROCESSING SYSTEM AND RECORDING MEDIUM

Technical Field

The present invention relates to a document processing
system for storing input documents after subjecting the
documents to a predetermined process and for retrieving or
clipping documents matching a given query from the stored
documents, and to a recording medium recording a program for
causing a computer to perform such processes. 

Background Art 
With recent popularization of the Internet and an
increasing number of full-text databases, information available
to individuals is drastically expanding.

To acquire desired information from among such a vast
amount of information, a method is generally adopted in which
a retrieval process, clipping process or the like is performed
using, as a key, search terms (query) describing features of data
to be obtained, for example.

With conventional large-scale commercial on-line
databases or full-text retrieval systems, however, if the
condition of search terms is loosened, noise (unneeded data)
included in the search results increases; conversely, if the
search condition is narrowed, search omission may result, giving
rise to a problem that it is difficult for the user to acquire
desired data.

Specifically, in a document culling or narrowing 
process or a document retrieval process adopted in conventional 
document filtering, ranking retrieval based on the degree of 
coincidence or relevancy between the query and document contents 
is conducted at best, and accordingly, it is difficult to carry 
out document culling that fully reflects the importance of 
information included in documents or the user's purpose of
performing search.

Consequently, even in the case where the user desires
to search for an organization named "Hashimoto", for example,
documents including "Hashimoto" as a name of place are very often
retrieved.

Also, when new products priced in the 200000 to 299999
yen range are to be searched for, it is necessary to use a query
which is created taking account of every possibility like "two
hundred thousand yen", "200,000 yen", "two hundred ten thousand
yen" and "two hundred fifty thousand yen".

Further, although it is possible to search for 
documents by specifying a document creation date, date 
information included in documents cannot be utilized for search. 

In the following sentences, for example, "the 1st"
means different days, though the words used are the same.

(a) On the 1st, Corporation A will release Product B.

(b) On the 1st, Corporation A released Product B.

If the sentences were created on February 15, 1997,
"1st" means March 1, 1997 in the case of (a), and means February
1, 1997 in the case of (b).

The conventional method is thus associated with a
problem that it is difficult to recognize the attributes of date
information in documents and to utilize such information
for search.
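
The distinction between sentences (a) and (b) can be illustrated with a minimal Python sketch. It is not the estimation procedure of the embodiment (that procedure is described later with reference to the flowcharts); the function name `resolve_day_of_month` and the tense labels "future" and "past" are assumptions made only for this illustration.

```python
from datetime import date


def resolve_day_of_month(day: int, created: date, tense: str) -> date:
    """Resolve a bare day-of-month such as "the 1st" to a full date.

    `tense` is "future" or "past" (hypothetical labels for this sketch).
    Raises ValueError if the resulting month has no such day.
    """
    year, month = created.year, created.month
    if tense == "future" and day < created.day:
        # The day has already passed this month, so it means next month.
        month += 1
        if month > 12:
            year, month = year + 1, 1
    elif tense == "past" and day > created.day:
        # The day has not yet arrived this month, so it means last month.
        month -= 1
        if month < 1:
            year, month = year - 1, 12
    return date(year, month, day)


created = date(1997, 2, 15)
print(resolve_day_of_month(1, created, "future"))  # 1997-03-01, sentence (a)
print(resolve_day_of_month(1, created, "past"))    # 1997-02-01, sentence (b)
```

The sketch only needs the document creation date and the tense of the sentence, which is why both are treated as inputs to date normalization.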

Disclosure of the Invention 

The present invention was created in view of the above 
circumstances, and an object thereof is to provide a document 
processing system capable of performing document retrieval or 
document culling that fully reflects the user's purpose of 
performing search. 

It is another object of the present invention to 
provide a recording medium recording a document processing 
program for performing a document retrieval process or clipping 
process that fully reflects the user's purpose of performing 
search. 

FIG. 1 illustrates the principles of the present
invention for achieving the above objects. The present
invention provides a document processing system for storing
input documents after subjecting the documents to a
predetermined process and for retrieving or clipping documents
matching a given query from the stored documents, the system
comprising knowledge information storing means 3, event
specifying means 4, attribute value extracting means 5,
correlating means 10, document storing means 11, and document
extracting means 12.



The knowledge information storing means 3 stores
knowledge information necessary for processing an input document.
The event specifying means 4 specifies the type of an event
described in the input document by looking up the knowledge
information stored in the knowledge information storing means
3. The attribute value extracting means 5 extracts, from the
input document, attribute values of attributes relating to the
event specified by the event specifying means 4 by looking up
the knowledge information stored in the knowledge information
storing means 3. The correlating means 10 correlates the
attribute values extracted by the attribute value extracting
means 5 with entities in the real world by looking up the knowledge
information stored in the knowledge information storing means
3. The document storing means 11 stores the attribute values
correlated by the correlating means 10 and the input document
or information specifying a storage location thereof in a manner
associated with each other. The document extracting means 12
looks up the attribute values and a query to retrieve or clip
target documents.

The knowledge information storing means 3 stores
events, attributes relating thereto, and information for
extracting attribute values constituting the attributes, in a
manner associated with one another. The event specifying means
4 collates an input document with the knowledge information
stored in the knowledge information storing means 3, to thereby
specify an event described in the document. The attribute value
extracting means 5 refers to the knowledge information storing
means 3 and extracts attribute values of attributes relating to
the specified event from the document. The correlating means
10 correlates the extracted attribute values with entities in
the real world into one-to-one correspondence by looking up the
knowledge information stored in the knowledge information
storing means 3. The document storing means 11 stores the
thus-correlated attribute values and the document or information
specifying a storage location thereof in a manner associated with
each other. The document extracting means 12 collates
information included in an input query with the attribute values
stored in the document storing means 11, to extract desired
documents.

Thus, the contents of documents are grasped in terms
of event, and information generated by extracting attribute
values of attributes constituting the grasped event and
correlating the extracted attribute values with entities in the
real world is looked up to retrieve or clip documents, whereby
the retrieval or clipping accuracy can be improved.
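
As a rough illustration of why retrieval over normalized attribute values helps, the following Python sketch stores each document as an event type plus normalized attributes and answers a price-range query numerically. The record layout, the names `NormalizedRecord` and `find_by_price`, and the sample records are assumptions for illustration only; the embodiment does not prescribe this structure.

```python
from dataclasses import dataclass, field


@dataclass
class NormalizedRecord:
    """Hypothetical stored form: document id, event type, normalized attributes."""
    doc_id: str
    event: str
    attributes: dict = field(default_factory=dict)


def find_by_price(records, event, low, high):
    """Return ids of documents of the given event whose price range overlaps [low, high]."""
    hits = []
    for rec in records:
        price = rec.attributes.get("price")
        if rec.event != event or price is None:
            continue
        p_low, p_high = price
        if p_low <= high and p_high >= low:  # the two ranges overlap
            hits.append(rec.doc_id)
    return hits


# Sample records (illustrative values, in yen).
records = [
    NormalizedRecord("doc-1", "release of new product", {"price": (178000, 378000)}),
    NormalizedRecord("doc-2", "release of new product", {"price": (500000, 600000)}),
    NormalizedRecord("doc-3", "merger", {"capital": (240000000, 240000000)}),
]

print(find_by_price(records, "release of new product", 200000, 299999))  # ['doc-1']
```

Because the amounts are stored as numbers, a single range comparison replaces the enumeration of every possible surface expression in the query.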

The above and other objects, features and advantages
of the present invention will become apparent from the following
description when taken in conjunction with the accompanying
drawings which illustrate preferred embodiments of the present
invention by way of example.

Brief Description of the Drawings

FIG. 1 is a block diagram showing an example of
configuration according to one embodiment of the present
invention;

FIG. 2 illustrates an example of configuration of a
communication system including a document processing system
shown in FIG. 1;

FIG. 3 is a flowchart illustrating, by way of example,
a document normalization process;

FIG. 4 is a flowchart also illustrating the document
normalization process;

FIG. 5 shows an example of knowledge information;

FIG. 6 is a flowchart illustrating details of a date
expression conversion process appearing in FIG. 3;

FIG. 7 is a chart showing an example of a numeral
conversion table;

FIG. 8 is a chart showing an example of a date
expression conversion table;

FIG. 9 is a flowchart illustrating details of a date
estimation process appearing in FIG. 6;

FIG. 10 is a flowchart illustrating details of a %year
estimation process appearing in FIG. 9;

FIG. 11 is a flowchart illustrating details of a %month
estimation process appearing in FIG. 9;

FIG. 12 is a flowchart illustrating details of a %day
estimation process appearing in FIG. 9;

FIG. 13 is a flowchart illustrating details of a money
amount expression conversion process appearing in FIG. 3;

FIG. 14 is a chart showing an example of a money amount
expression conversion table;



FIG. 15 shows an example of a document input to the
embodiment shown in FIG. 1;

FIG. 16 shows an example of normalized information
generated as a result of processing the document shown in FIG.
15;

FIG. 17 shows another example of a document input to
the embodiment shown in FIG. 1;

FIG. 18 shows an example of normalized information
generated as a result of processing the document shown in FIG.
17;

FIG. 19 shows an example of an input screen displayed
when documents including product sales information are to be
retrieved;

FIG. 20 shows an example of data entry in the input
screen shown in FIG. 19;

FIG. 21 shows an example of a search results display
screen associated with the input screen shown in FIG. 19;

FIG. 22 shows an example of a screen showing search
results obtained as a result of the data entry shown in FIG. 20;

FIG. 23 shows an example of an input screen displayed
when documents including organization merger information are to
be retrieved;

FIG. 24 shows an example of data entry in the input
screen shown in FIG. 23;

FIG. 25 shows an example of a search results display
screen associated with the input screen shown in FIG. 23;

FIG. 26 shows an example of a screen showing a search
result obtained as a result of the data entry shown in FIG. 24;

FIG. 27 is a flowchart illustrating a query 
normalization process, by way of example; 

FIG. 28 is a flowchart also illustrating the query
normalization process;

FIG. 29 is a flowchart illustrating an example of 
processing a query from a user when documents are to be clipped; 

FIG. 30 is a flowchart illustrating details of a 
relevancy determination process appearing in FIG. 29; and 

FIG. 31 is a flowchart illustrating an example of 
document processing executed when documents are to be clipped. 



Best Mode of Carrying out the Invention

An embodiment of the present invention will be
hereinafter described with reference to the drawings.

FIG. 1 shows an example of a configuration according
to the embodiment of the present invention. In the figure, a
document input section 1 is supplied with documents to be
processed, and a user interface section 2 receives queries from
users.

Knowledge information storing means 3 stores
information about events, described later, and attributes
relating thereto, as well as information for converting proper
names to proper codes.

Event specifying means 4 looks up knowledge
information (information about event types) stored in the
knowledge information storing means 3, to specify the type of
an event described in a document or a query input from the document
input section 1 or from the user interface section 2.

"Event" denotes herein an "occurrence" that takes
place in the real world. A newspaper article, for example, is
regarded as describing an event that took place (or will take
place) in the real world, such as the event "Corporation A
releases X.", along with various supplementary information.

Accordingly, if the above sentence "Corporation A
releases X.", for example, is input to the event specifying means
4, the event described in the sentence is specified as <release
of new product>. The signs "<" and ">" represent
conceptualization of the terms inserted therebetween by
abstraction.

In documents like newspaper articles in which events
described are definite and the patterns of expression are limited,
certain constraints naturally fall upon structures that the
described events can take on (hereinafter abbreviated as "event
structures" where appropriate). By focusing on such "events"
when analyzing documents, therefore, it is possible to perform
effective processing.

Attribute value extracting means 5 looks up knowledge
information (information about attributes relating to a certain
event) stored in the knowledge information storing means 3 and
extracts attribute values from the document or query.

In relation to the aforementioned event "release of
new product", for example, the knowledge information storing
means 3 stores attributes <selling company>, <product info>,
<date>, <alteration>, etc., and the attribute value extracting
means 5 acquires attributes corresponding to the event specified
by the event specifying means 4 from the knowledge information
storing means 3, and extracts attribute values corresponding to
the acquired attributes from the document or query.

For example, in the case of the event "Corporation A
releases X." mentioned above, the attribute value "Corporation
A" corresponding to the attribute <selling company> and the
attribute value "X" corresponding to the attribute <product
info> are extracted.
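
The "Corporation A releases X." example can be mimicked with a small pattern-matching sketch in Python. The rule table `EVENT_RULES`, the function `specify_event`, and the regular expression are hypothetical stand-ins for the knowledge information; they are not the mapping rules of FIG. 5.

```python
import re

# Hypothetical knowledge information: event type -> pattern whose named
# groups correspond to attributes of that event.
EVENT_RULES = {
    "release of new product": re.compile(
        r"^(?P<selling_company>.+?) "
        r"(?:releases|released|will release) "
        r"(?P<product_info>.+?)\.?$"
    ),
}


def specify_event(sentence: str):
    """Return (event type, attribute values) for the first matching rule."""
    for event, pattern in EVENT_RULES.items():
        match = pattern.match(sentence.strip())
        if match:
            return event, match.groupdict()
    return None, {}


event, attrs = specify_event("Corporation A releases X.")
print(event)  # release of new product
print(attrs)  # {'selling_company': 'Corporation A', 'product_info': 'X'}
```

In this sketch one pattern both specifies the event and extracts the attribute values; the embodiment treats these as two steps that share the same knowledge information.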

Creation date acquiring means 6 acquires a date of 
creation of the document or query, and tense acquiring means 7 
acquires tense of a sentence constituting the document or query. 

Normalizing means 8 selects, from among the attribute
values extracted by the attribute value extracting means 5, those
which can be converted to numerical values, and converts
(normalizes) them to corresponding numerical values.

Unit converting means 9 converts units of the numerical 
values normalized by the normalizing means 8. 

Correlating means 10 looks up the knowledge
information stored in the knowledge information storing means
3 to correlate the attribute values extracted by the attribute
value extracting means 5 with entities in the real world. The
"entity" means herein an "object" in the real world that is
denoted by the attribute value described in the document. If,
in the above example, there exist a plurality of enterprises
called "Corporation A", then it is necessary to specify which
enterprise (object) is denoted by "Corporation A" described in
the document. Accordingly, the correlating means 10 looks up
other attribute values (e.g., "president name", "place of head
office", etc.) in the document to identify "Corporation A".
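
A minimal Python sketch of this kind of identification is given below, assuming a small in-memory table of known enterprises. The table `KNOWN_ENTITIES`, its codes and attribute values, and the function `identify_entity` are all invented for illustration.

```python
# Hypothetical knowledge information: two enterprises share one name.
KNOWN_ENTITIES = [
    {"code": "A-1", "name": "Corporation A", "president": "Sato", "head_office": "Tokyo"},
    {"code": "A-2", "name": "Corporation A", "president": "Suzuki", "head_office": "Osaka"},
]


def identify_entity(name, clues, entities=KNOWN_ENTITIES):
    """Return the code of the unique best-matching entity, or None if ambiguous.

    `clues` holds other attribute values found in the document,
    e.g. {"president": "Suzuki"}.
    """
    candidates = [e for e in entities if e["name"] == name]
    if not candidates:
        return None
    # Score each candidate by the number of clues it agrees with.
    scores = [sum(1 for k, v in clues.items() if e.get(k) == v) for e in candidates]
    best = max(scores)
    winners = [e for e, s in zip(candidates, scores) if s == best]
    return winners[0]["code"] if len(winners) == 1 else None


print(identify_entity("Corporation A", {"president": "Suzuki"}))  # A-2
print(identify_entity("Corporation A", {}))                       # None
```

Returning None when the clues do not single out one candidate keeps an ambiguous name from being tied to the wrong enterprise.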
Document storing means 11 stores a set of the attribute
values correlated by the correlating means 10 and the original
document (or information specifying a storage location of the
original document) in a manner associated with each other.

Document extracting means 12 acquires, from the
document storing means 11, documents matching a query supplied
thereto from the correlating means 10 by looking up the attribute
values. Then, looking up the importance of each document
calculated by importance calculating means 13, the document
extracting means outputs those documents of which the importance
is higher than a certain threshold.

The importance calculating means 13 calculates the
importance of a target document by obtaining the frequency of
occurrence of a certain keyword, for example.
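
The keyword-frequency example can be sketched in Python as follows. Treating the raw occurrence count as the importance, and the names `importance` and `filter_by_importance`, are assumptions for illustration; the text gives keyword frequency only as one possible measure.

```python
def importance(text: str, keyword: str) -> int:
    """Importance as the number of case-insensitive occurrences of `keyword`."""
    return text.lower().count(keyword.lower())


def filter_by_importance(documents, keyword, threshold):
    """Keep only documents whose importance is higher than `threshold`."""
    return [doc for doc in documents if importance(doc, keyword) > threshold]


docs = ["PC PC notebook", "forklift merger", "new pc"]
print(filter_by_importance(docs, "pc", 1))  # ['PC PC notebook']
```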

Referring now to FIG. 2, an example of a communication
system configuration including the embodiment shown in FIG. 1
will be described.

In FIG. 2, the document processing system 20 shown in
FIG. 1 is connected to a network 21 such as the Internet, for
example.

To the network 21 are connected terminal units 22a and
22b, a server 23, etc.

The terminal unit 22a, 22b accepts a query which the
user has entered through operation of an input section thereof
and transmits the input query to the document processing system
20. When documents matching the query are transmitted from the
document processing system 20, the terminal unit receives the
documents and outputs same to its CRT (Cathode Ray Tube) monitor
or the like to be displayed thereat.

The server 23 transmits, through the network 21,
information such as documents and images stored in a storage
section 23a to a device which has made a request.

The document processing system 20 stores the query
transmitted from the terminal unit 22a, 22b, and when a new
document is supplied from the server 23, for example, and if there
is a high degree of relevancy between this document and the query,
the document processing system transmits the document to the
terminal unit 22a or 22b.

The operation of the above embodiment will be now
described.

FIG. 3 is a flowchart showing an example of a process
executed when a new document is input via the document input
section 1 (e.g., a new document is supplied from the server 23
shown in FIG. 2) in the embodiment shown in FIG. 1.

Upon start of the process shown in the flowchart, the
following steps are executed.

[S1] The document input section 1 is supplied with a
new document.

[S2] The event specifying means 4 specifies the type
of an event described in the document.
Specifically, the event specifying means 4 looks up
information (see FIG. 5) on event-expression mapping rules
stored in the knowledge information storing means 3, to specify
the type of the event described in the document. In the mapping
rules shown in FIG. 5, a part between "module" and "end module"
constitutes one event (or entity)-expression mapping rule and
describes variations of expression of one event. The mapping
rules shown in FIG. 5 will be described in detail later.

[S3] The attribute value extracting means 5 extracts
attribute values by looking up the knowledge information stored
in the knowledge information storing means 3.

For example, the attribute value extracting means 5
extracts, from the document, attribute values of attributes
(e.g., <company info>, <product>, etc.) included in a definition
applicable to the input document, among the variations of the
event described under "module main" shown in FIG. 5, by looking
up other "modules", "def's", etc. Looking up definitions
described at lines 15 to 19 and at lines 12 to 14, for example,
attribute values corresponding to the attribute <company info>
are extracted from the document by pattern matching.

[S4] The normalizing means 8 determines whether or not
a date expression is included in the extracted attribute values.
If a date expression is included, the flow proceeds to Step S5;
if not, the flow proceeds to Step S7.

[S5] The creation date acquiring means 6 acquires a
date of creation of the document, and the tense acquiring means
7 acquires a tense of the sentence describing the event in
question.

[S6] Looking up the document creation date information
and tense information thus acquired, the normalizing means 8
performs a "DATE EXPRESSION CONVERSION PROCESS" to convert the
date expression into corresponding numerical values.

Details of this process will be described later with
reference to FIG. 6.

[S7] The normalizing means 8 determines whether or not
a money amount expression is included in the extracted attribute
values. If a money amount expression is included, the flow
proceeds to Step S8; if not, the flow proceeds to Step S11.

[S8] The normalizing means 8 determines whether or not
the money amount expression in question is in a prescribed
currency unit. If the money amount expression is in the
prescribed currency unit, the flow proceeds to Step S10; if not,
the flow proceeds to Step S9.

Where the prescribed currency unit is "yen", for
example, and if a money amount expression in the unit "$" exists,
the flow proceeds to Step S9.

[S9] The unit converting means 9 reads out an exchange
rate from a storage section therein, and converts the money amount
expression into the prescribed currency unit.

If the expression "$100", for example, exists and
if the exchange rate is $1 = 130 yen, "$100" is converted
to "13000 yen".

[S10] The normalizing means 8 performs a "MONEY AMOUNT
EXPRESSION CONVERSION PROCESS" to convert the money amount
expression into a numerical value. Details of this process will
be described later with reference to FIG. 13.

In the above example, "13000 yen" (character
string) is converted to "13000" (numerical value).
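
Steps S8 to S10 can be approximated by the following Python sketch for digit-based expressions. It is a simplification and not the process of FIG. 13: the function name `normalize_money`, the accepted input forms, and the rate table are assumptions, with the rate chosen to match the "$100" to "13000 yen" example.

```python
import re

# Assumed rate table (yen per unit), matching the "$100" -> "13000 yen" example.
EXCHANGE_RATES = {"$": 130}


def normalize_money(expression: str) -> int:
    """Convert a digit-based money expression to an integer amount in yen.

    Accepts forms such as "$100", "13000 yen" or "178,000 yen".
    """
    compact = expression.replace(" ", "").replace(",", "")
    match = re.fullmatch(r"(\$)?(\d+)(yen)?", compact)
    # Exactly one currency marker ("$" prefix or "yen" suffix) must be present.
    if not match or bool(match.group(1)) == bool(match.group(3)):
        raise ValueError(f"unsupported money expression: {expression!r}")
    amount = int(match.group(2))
    if match.group(1):
        # S9: not in the prescribed unit, so apply the exchange rate.
        amount *= EXCHANGE_RATES["$"]
    return amount  # S10: a computable numerical value


print(normalize_money("$100"))       # 13000
print(normalize_money("13000 yen"))  # 13000
```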

[S11] The normalizing means 8 determines whether or
not there exists some other numerical expression. If there
exists some other numerical expression, the flow proceeds to Step
S12; if not, the flow proceeds to Step S13.

For example, if there exists an expression like "Number
of shipment is fifty thousand sets", the flow proceeds to Step
S12.

[S12] The normalizing means 8 converts the numerical
expressions included in the attribute values into corresponding
numerical values. In the above example, "50000" (character
string) is converted to a computable numerical value of "50000".
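
For numerals spelled out in words, such as "fifty thousand" in the shipment example, a conversion along the following lines is conceivable. This Python sketch handles only English number words and is an assumption for illustration; the word tables and `words_to_number` are not taken from the embodiment.

```python
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def words_to_number(text: str) -> int:
    """Convert an English number phrase to an integer."""
    total = 0    # sum of completed thousand/million groups
    current = 0  # value of the group being built
    for word in text.lower().replace("-", " ").split():
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current *= 100
        elif word in SCALES:
            total += current * SCALES[word]
            current = 0
        else:
            raise ValueError(f"not a number word: {word!r}")
    return total + current


print(words_to_number("fifty thousand"))                      # 50000
print(words_to_number("one hundred seventy eight thousand"))  # 178000
```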

[S13] The correlating means 10 determines whether or 
not a proper name (e.g., "Hashimoto Electric" etc.) is included 
in the attribute values. If a proper name is included, the flow 
proceeds to Step S14; if not, the flow proceeds to Step S15. 

[S14] The correlating means 10 extracts the proper name,
acquires a proper name code corresponding thereto from the 
knowledge information storing means 3, and assigns the acquired 
proper name code. 

For example, a proper name code "000 11" corresponding 
to "Hashimoto Electric" mentioned above is read out from the 
knowledge information storing means 3 and assigned. 

The knowledge information storing means 3 stores
information generated by correlating relevant proper names with
one another, and accordingly, even in the case where a certain
proper name in the document has a plurality of possibilities,
it can be accurately specified by looking up other correlated
proper names.

Specifically, in the case where the proper name
"Hashimoto Electric" has two possibilities "Hashimoto Electric
Corp." and "Hashimoto Electric Inc." (companies with an
identical name exist), the president's name, location, etc.
described in the document, for example, are compared with
respective correlated proper names stored in the knowledge
information storing means 3, whereby the correct proper name can
be acquired by narrowing down the possibilities.

[S15] The correlating means 10 determines whether or
not there exists a reference expression (expression like "the
company" or "both of them"). If such a reference expression
exists, the flow proceeds to Step S16; if not, the flow proceeds
to Step S18.

For example, if "the company", which is a reference
expression, exists, the flow proceeds to Step S16.

[S16] The correlating means 10 identifies a target
which the reference expression refers to.

In the case of "Hashimoto Electric, President Nakayama,
announced ... Hashimoto Computer, the same president, starts ...",
"President Nakayama" is identified as the target which the
reference expression "the same president" refers to.

As a method for such identification, when the reference
expression, such as "the company" or "the same president", is
detected, the corresponding proper name preceding the expression
may be identified as the target which the reference expression
refers to.

[S17] The correlating means 10 acquires a proper name
code corresponding to the target which the reference expression
refers to, and assigns the acquired proper name code to the
reference expression.

In the above example, a proper code "0001" for
"President Nakayama" is assigned to the reference expression
"the same president".

[S18] The correlating means 10 stores the normalized
attribute values (hereinafter referred to as normalized
information) and the original document (or information
specifying a storage location of the original document) in the
document storing means 11 in a manner associated with each other.

The above process makes it possible to specify an event
described in an input document and to acquire attribute values
of attributes relating to the event. Normalized information
which is obtained by correlating the acquired attribute values
with entities in the real world is then stored, together with
the original document (or information specifying the storage
location of the original document), in the document storing means
11.
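
The reference-expression handling of Steps S15 to S17 can be sketched in Python as below, using the "nearest preceding proper name" method mentioned above. The mention tuples, the table `REFERENCE_TYPES`, and the organization codes are hypothetical; only the code "0001" for "President Nakayama" follows the example in the text.

```python
# Hypothetical table: reference expression -> kind of entity it refers to.
REFERENCE_TYPES = {
    "the company": "organization",
    "the same president": "person",
}


def resolve_references(mentions):
    """Assign proper name codes to reference expressions.

    `mentions` is a list of (text, entity_type, code) in document order;
    reference expressions carry entity_type None and code None.
    Each reference receives the code of the nearest preceding proper
    name of the matching type, or None if there is no such name.
    """
    resolved = []
    last_code_by_type = {}
    for text, entity_type, code in mentions:
        if text in REFERENCE_TYPES:
            code = last_code_by_type.get(REFERENCE_TYPES[text])
        elif code is not None:
            last_code_by_type[entity_type] = code
        resolved.append((text, code))
    return resolved


mentions = [
    ("Hashimoto Electric", "organization", "ORG-1"),  # hypothetical code
    ("President Nakayama", "person", "0001"),
    ("Hashimoto Computer", "organization", "ORG-2"),  # hypothetical code
    ("the same president", None, None),
]
print(resolve_references(mentions)[-1])  # ('the same president', '0001')
```

Tracking the last code per entity type lets "the same president" skip over the intervening organization name and still reach the preceding person.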

The "DATE EXPRESSION CONVERSION PROCESS" appearing in
Step S6 in FIG. 3 will now be described in detail.

FIG. 6 is a flowchart illustrating in detail the "DATE
EXPRESSION CONVERSION PROCESS" appearing in FIG. 3. Upon start
of the process shown in the flowchart, the following steps are
executed.

[S30] The creation date acquiring means 6 acquires the
date of creation of the document and substitutes the acquired
date for %docyear, %docmonth, and %docday. In the case of a
newspaper article, for example, the date of issue of the article
is acquired as the document creation date. For documents other
than newspaper articles, the creation date is acquired by
looking up the file attributes.

[S31] The normalizing means 8 extracts a date
expression from the attribute values.

If the sentence in question is "On the 1st Hashimoto
Electric releases a new PC", for example, "the 1st" is extracted
as a date expression.

[S32] The normalizing means 8 determines whether or
not the extracted date expression consists of a combination of
a numeral and "year", "month" or "day". If the decision in
this step is YES, the flow proceeds to Step S33; otherwise the
flow proceeds to Step S34.

In the case of "the 1st" mentioned above, for example,
the decision in this step is YES and the flow accordingly
proceeds to Step S33.

[S33] The normalizing means 8 looks up a numeral
conversion table (see FIG. 7) to convert the date expression.

In the numeral conversion table shown in FIG. 7,
numerical expressions are correlated with their corresponding
normalized numerical values, and when a numerical expression
(character string) is given, a numerical value corresponding
thereto is returned.

[S34] The normalizing means 8 looks up a date
expression conversion table shown in FIG. 8, to convert the date
expression to corresponding numerical values.

In the date expression conversion table shown in FIG.
8, expressions are correlated with their respective types and
corresponding normalized numerical values. The type is a
pattern of expression; for example, "date" indicates a specific
day and "daterange" indicates a specific term. If a document
created in the year 1998 includes the expression "March 4 last
year", (%docyear - 1) = (1998 - 1) = 1997 is substituted for %year,
and "3" and "4" are substituted for %month and %day, respectively.
Also, if a document created in 1997 includes the
expression "spring of 1998", "1998" is substituted for %year,
and therefore, a normalized value of "from 1998-3-1 to
1998-5-30" can be obtained.

The date expression conversion table is shown by way
of example only, and it may take various other forms than the
illustrated one.
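
As a rough illustration of such a table lookup, the two examples
given in the text might be handled as below. This is a sketch
under assumptions: the regular expressions and the function name
are hypothetical stand-ins for the table of FIG. 8, and the end
date of the "spring" range is copied from the text as it stands.

```python
import re
from typing import Optional, Tuple

MONTHS = {
    "January": 1, "February": 2, "March": 3, "April": 4,
    "May": 5, "June": 6, "July": 7, "August": 8,
    "September": 9, "October": 10, "November": 11, "December": 12,
}


def convert_date_expression(expr: str, docyear: int) -> Optional[Tuple[str, str]]:
    """Return (type, normalized value) for two sample table entries, else None."""
    m = re.fullmatch(r"([A-Z][a-z]+) (\d{1,2}) last year", expr)
    if m and m.group(1) in MONTHS:
        # %year = %docyear - 1; %month and %day come from the expression itself
        return ("date", f"{docyear - 1}-{MONTHS[m.group(1)]}-{int(m.group(2))}")
    m = re.fullmatch(r"spring of (\d{4})", expr)
    if m:
        year = m.group(1)
        # range boundaries as quoted in the text
        return ("daterange", f"from {year}-3-1 to {year}-5-30")
    return None
```

An expression that matches neither entry yields None, which
corresponds to the case where further table entries or the date
estimation process would be needed.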

[S35] The normalizing means 8 determines whether or
not all values have been acquired. If it is judged that all
values have been acquired, the flow proceeds to Step S37;
otherwise the flow proceeds to Step S36.

For example, if all values corresponding to year, month
and day have been acquired, the flow proceeds to Step S37.

[S36] The normalizing means 8 performs a date
estimation process. Details of this process will be described
later with reference to FIG. 9.

[S37] The normalizing means 8 substitutes the
normalized numerical values for %year, %month and %day,
whereupon the process is ended.

The above process makes it possible to convert date
expressions included in documents to corresponding numerical
values.

Referring now to FIG. 9, the "DATE ESTIMATION PROCESS"
appearing in Step S36 in FIG. 6 will be described in detail. Upon
start of the process shown in the flowchart of FIG. 9, the
following steps are executed.

[S50] The normalizing means 8 determines whether or
not %year alone remains unsubstituted. If %year alone is
unsubstituted, the flow proceeds to Step S51; otherwise the flow
proceeds to Step S52.

[S51] The normalizing means 8 performs a %year 
estimation process. Details of this process will be described 
later with reference to FIG. 10. 

[S52] The normalizing means 8 determines whether or 
not the values other than %day remain unsubstituted. If such 
values are unsubstituted, the flow proceeds to Step S53; 
otherwise the flow proceeds to Step S55. 

[S53] The normalizing means 8 performs a %month
estimation process. Details of this process will be described
later with reference to FIG. 11.

[S54] The normalizing means 8 performs the %year
estimation process.

[S55] The normalizing means 8 determines whether or
not the values other than %month remain unsubstituted. If such
values are unsubstituted, the flow proceeds to Step S56;
otherwise the flow proceeds to Step S58.

[S56] The normalizing means 8 performs a %day
estimation process. Details of this process will be described
later with reference to FIG. 12.

[S57] The normalizing means 8 performs the %year
estimation process.

[S58] The normalizing means 8 determines whether or
not the values other than %year remain unsubstituted. If such
values are unsubstituted, the flow proceeds to Step S59;
otherwise the process is ended.

[S59] The normalizing means 8 sets "from %year-1-1
to %year-12-31" as an estimated date. Namely, in cases where
the values other than %year remain unsubstituted, a normalized
value is set such that the broadest possible range is covered,
thereby preventing search omissions.
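
The branching of Steps S50 to S59 can be summarized in a short
sketch. It only decides which estimation steps would run, in the
order given above; the function name and the step labels it
returns are hypothetical, and a value of None is assumed to mean
"unsubstituted".

```python
from typing import List, Optional


def plan_date_estimation(year: Optional[int],
                         month: Optional[int],
                         day: Optional[int]) -> List[str]:
    """Return the estimation steps to run, following the order of S50 to S59."""
    missing = (year is None, month is None, day is None)
    if missing == (True, False, False):   # S50: %year alone is unsubstituted
        return ["year"]                   # S51
    if missing == (True, True, False):    # S52: everything except %day is unsubstituted
        return ["month", "year"]          # S53, S54
    if missing == (True, False, True):    # S55: everything except %month is unsubstituted
        return ["day", "year"]            # S56, S57
    if missing == (False, True, True):    # S58: everything except %year is unsubstituted
        return ["whole-year range"]       # S59
    return []                             # nothing to estimate
```

For instance, an expression such as "the 4th" (only %day known)
leads to the %month estimation followed by the %year estimation.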

Referring now to FIG. 10, the "%year ESTIMATION
PROCESS" appearing in Steps S51, S54 and S57 in FIG. 9 will be
described in detail. Upon start of the process shown in the
flowchart, the following steps are executed.

[S60] The normalizing means 8 determines whether or
not the tense acquired from the target sentence by the tense
acquiring means 7 is the future. If the acquired tense is the
future, the flow proceeds to Step S61; if not, the flow proceeds
to Step S65.

[S61] The normalizing means 8 determines whether or 
not %docmonth is greater than %month. If it is judged that the 
former is greater than the latter, the flow proceeds to Step S62; 
if the decision is otherwise, the flow proceeds to Step S63. 

[S62] The normalizing means 8 substitutes the value
(%docyear + 1) for %year.

For example, where the month in which the document was
created is April and the sentence includes the expression
"... is expected ... in March", "March" is estimated to be "March
next year", and therefore, (%docyear + 1) is substituted for %year.

[S63] The normalizing means 8 determines whether or
not %docmonth shows a value smaller than or equal to %month. If
the decision in this step is YES, the flow proceeds to Step S64;
if the decision is otherwise, the flow proceeds to Step S65.

[S64] The normalizing means 8 substitutes the value 

of %docyear for %year. 

[S65] The normalizing means 8 determines whether or
not the tense acquired by the tense acquiring means 7 is the past.
If the decision in this step is YES, the flow proceeds to Step
S66; if not, the flow resumes (returns to) the process of FIG. 9.

[S66] The normalizing means 8 determines whether or
not %docmonth shows a value greater than or equal to the value
of %month. If the decision in this step is YES, the flow proceeds
to Step S67; if not, the flow proceeds to Step S68.

[S67] The normalizing means 8 substitutes the value
of %docyear for %year.

[S68] The normalizing means 8 determines whether or
not the value of %docmonth is smaller than the value of %month.
If the decision in this step is YES, the flow proceeds to Step
S69; if not, the flow resumes the process of FIG. 9.

[S69] The normalizing means 8 substitutes the value 
(%docyear - 1) for %year. 

For example, where the month in which the document was
created is April and the sentence includes the expression "...
has been ... in June", "June" is estimated to be "June last year",
and therefore, the value (%docyear - 1) is substituted for %year.

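
Steps S60 to S69 amount to a small decision function. The sketch
below is one possible reading of the flowchart; the function name
and the string values used for the tense are assumptions made for
illustration.

```python
from typing import Optional


def estimate_year(tense: str, docyear: int, docmonth: int, month: int) -> Optional[int]:
    """Estimate %year from the tense and the document creation date."""
    if tense == "future":
        if docmonth > month:       # S61 -> S62: the month has already passed this year
            return docyear + 1
        return docyear             # S63 -> S64: %docmonth <= %month
    if tense == "past":
        if docmonth >= month:      # S66 -> S67
            return docyear
        return docyear - 1         # S68 -> S69: %docmonth < %month
    return None                    # neither future nor past: %year stays unsubstituted


# Document created in April 1998, "... is expected ... in March"
next_march = estimate_year("future", 1998, 4, 3)
# Document created in April 1998, "... has been ... in June"
last_june = estimate_year("past", 1998, 4, 6)
```

The two calls reproduce the "March next year" and "June last
year" examples given above.
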
Referring now to FIG. 11, the "%month ESTIMATION
PROCESS" appearing in Step S53 in FIG. 9 will be described in
detail. Upon start of the process shown in the flowchart, the
following steps are executed.

[S80] The normalizing means 8 determines whether or
not the tense of the target sentence acquired by the tense
acquiring means 7 is the future. If the acquired tense is the
future, the flow proceeds to Step S81; if not, the flow proceeds
to Step S85.

[S81] The normalizing means 8 determines whether or
not %docday is greater than %day. If the decision in this step
is YES, the flow proceeds to Step S82; if not, the flow proceeds
to Step S83.

[S82] The normalizing means 8 substitutes the value
(%docmonth + 1) for %month.

For example, where the day on which the document was
created is the 2nd and the sentence includes the expression
"On the 4th ... will start", "4th" is estimated to be the "4th
of the same month", and therefore, the value (%docmonth + 1) is
substituted for %month.

[S83] The normalizing means 8 determines whether or
not %docday shows a value smaller than or equal to %day. If the
decision in this step is YES, the flow proceeds to Step S84; if
not, the flow proceeds to Step S85.

[S84] The normalizing means 8 substitutes the value
of %docmonth for %month.

[S85] The normalizing means 8 determines whether or
not the tense acquired by the tense acquiring means 7 is the past.
If the decision in this step is YES, the flow proceeds to Step
S86; if not, the flow resumes (returns to) the process of FIG. 9.

[S86] The normalizing means 8 determines whether or
not %docday shows a value greater than or equal to the value
of %day. If the decision in this step is YES, the flow proceeds
to Step S87; if not, the flow proceeds to Step S88.

[S87] The normalizing means 8 substitutes the value 

of %docmonth for %month. 

[S88] The normalizing means 8 determines whether or
not the value of %docday is smaller than the value of %day. If
the decision in this step is YES, the flow proceeds to Step S89;
if not, the flow resumes the process of FIG. 9.

[S89] The normalizing means 8 substitutes the value
(%docmonth - 1) for %month.

For example, where the day on which the document was
created is the 4th and the sentence includes the expression
"On the 6th, ... have been ...", "6th" is estimated to be "the 6th
of the previous month", and therefore, the value (%docmonth -
1) is substituted for %month.

Referring now to FIG. 12, the "%day ESTIMATION PROCESS"
appearing in Step S56 in FIG. 9 will be described in detail. Upon
start of the process shown in the flowchart, the following steps
are executed.

[S100] The normalizing means 8 determines whether or
not the value of %month equals one of 1, 3, 5, 7, 8, 10 and 12.
If the decision in this step is YES, the flow proceeds to Step
S101; if not, the flow proceeds to Step S102.

[S101] The normalizing means 8 generates
"from %year-%month-1 to %year-%month-31" as date information.

[S102] The normalizing means 8 determines whether or
not the value of %month equals one of 4, 6, 9 and 11. If the
decision in this step is YES, the flow proceeds to Step S103;
if not, the flow proceeds to Step S104.

[S103] The normalizing means 8 generates
"from %year-%month-1 to %year-%month-30" as date information.

[S104] The normalizing means 8 looks up the attribute
value relating to "year" to determine whether or not the year
in question is a leap year. If the year in question is a leap
year, the flow proceeds to Step S105; if not, the flow proceeds
to Step S106.

[S105] The normalizing means 8 generates
"from %year-%month-1 to %year-%month-29" as date information.

[S106] The normalizing means 8 generates
"from %year-%month-1 to %year-%month-28" as date information.
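
Steps S100 to S106 might be sketched as follows. The function
name is hypothetical, the 31-day months are taken from the
standard calendar, and the leap-year test uses Python's
calendar.isleap rather than any table of the embodiment.

```python
import calendar


def estimate_day_range(year: int, month: int) -> str:
    """Return the whole-month range used when %day is unknown."""
    if month in (1, 3, 5, 7, 8, 10, 12):    # S100 -> S101
        last_day = 31
    elif month in (4, 6, 9, 11):            # S102 -> S103
        last_day = 30
    elif calendar.isleap(year):             # S104 -> S105 (February, leap year)
        last_day = 29
    else:                                   # S106 (February, common year)
        last_day = 28
    return f"from {year}-{month}-1 to {year}-{month}-{last_day}"
```

A call such as estimate_day_range(1996, 2) covers February 1 to
February 29 because 1996 is a leap year.
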
According to the process described above, even in the
case where a document includes only insufficient date
information, date information is estimated based on the document
creation date and the tense of the target sentence, so that date
information included in documents can be made full use of at the
time of performing search.

For example, even a vague expression like "spring of
next year" can be converted (normalized) to a specific numerical
value (e.g., March 1, 1998 to May 31, 1998), thus making it
possible to use such a vague expression also for search.

Referring now to FIG. 13, the "MONEY AMOUNT EXPRESSION
CONVERSION PROCESS" appearing in Step S10 in FIG. 3 will be
described in detail. Upon start of the process shown in the
flowchart, the following steps are executed.

[S120] The normalizing means 8 looks up a money amount
expression conversion table shown in FIG. 14, to convert the money
amount expression to a corresponding numerical value, and
substitutes the value obtained for a variable x.

In the case of the expression "two hundred thousand
yen", for example, "two" is first converted to "2", "hundred"
is converted to "× 100", and "thousand" is converted to "× 1000",
whereby the value "200000" is obtained as a result.
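
A word-by-word conversion of this kind could look like the sketch
below. The word tables and the function name are illustrative
assumptions and are not the contents of the conversion table in
the figure.

```python
SMALL = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11,
    "twelve": 12, "thirteen": 13, "fourteen": 14, "fifteen": 15,
    "sixteen": 16, "seventeen": 17, "eighteen": 18, "nineteen": 19,
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
LARGE = {"thousand": 1000, "million": 1000000}


def money_words_to_yen(expression: str) -> int:
    """Convert an expression such as 'two hundred thousand yen' to an integer."""
    total, current = 0, 0
    for word in expression.lower().replace("-", " ").split():
        if word == "yen":
            continue
        if word in SMALL:
            current += SMALL[word]               # "two" -> 2
        elif word == "hundred":
            current = max(current, 1) * 100      # "hundred" -> x 100
        elif word in LARGE:
            total += max(current, 1) * LARGE[word]   # "thousand" -> x 1000
            current = 0
        else:
            raise ValueError(f"unknown word: {word}")
    return total + current
```

The running value is multiplied when a scale word is met, which
mirrors the "2", "× 100", "× 1000" sequence of the example.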

[S121] The normalizing means 8 determines whether or
not the money amount expression ends with "or more". If the money
amount expression ends with "or more", the flow proceeds to Step
S122; if not, the flow proceeds to Step S123.

[S122] The normalizing means 8 generates "from x to
*" as a normalized expression. The symbol "*" denotes an
arbitrary value.

In the above example, x = 2000, and therefore, "from
2000 to *" is generated.

[S123] The normalizing means 8 determines whether or
not the money amount expression ends with "or less". If the money
amount expression ends with "or less", the flow proceeds to Step
S124; if not, the flow proceeds to Step S125.

[S124] The normalizing means 8 generates "from * to
x" as the normalized expression.

[S125] The normalizing means 8 determines whether or
not the money amount expression ends with "range" or "level".
If the money amount expression ends with "range" or "level", the
flow proceeds to Step S126; if not, the flow proceeds to Step
S128.

[S126] The normalizing means 8 generates "from x to
x" as the normalized expression.

[S127] The normalizing means 8 converts each "0"
included in x following "to" to "9".

For example, in the case of the expression "100,000
yen range (100000 to 199999 yen)", x = 100000; therefore, "0's"
included in x following "to" are all converted to "9", providing
"199999". Consequently, "from 100000 to 199999" is generated
as the normalized expression.

[S128] The normalizing means 8 determines whether or
not the money amount expression includes "lower half of ...
range". If the money amount expression includes "lower half
of ... range", the flow proceeds to Step S129; if not, the flow
proceeds to Step S131.

[S129] The normalizing means 8 generates "from x to
x" as the normalized expression.

[S130] The normalizing means 8 converts the first "0"
included in x following "to" to "5".

For example, in the case of the expression "lower half
of a hundred thousand yen range (the lower half of the 100000
to 199999 yen range)", x = 100000; therefore, the first "0"
included in x following "to" is converted to "5", providing
"150000". Consequently, "from 100000 to 150000" is generated
as the normalized expression.

[S131] The normalizing means 8 determines whether or
not the money amount expression ends with "higher half of ... range".
If the money amount expression ends with "higher half of ... range",
the flow proceeds to Step S132; if not, the flow resumes the
process of FIG. 3.

[S132] The normalizing means 8 generates "from x to
x" as the normalized expression.

[S133] The normalizing means 8 converts the first "0"
included in x following "from" to "6".

[S134] The normalizing means 8 converts each "0" included
in x following "to" to "9".

For example, in the case of the expression "the higher
half of a hundred thousand yen range (the higher half of the 100000
to 199999 yen range)", x = 100000; therefore, the first "0"
included in x following "from" is converted to "6" in Step S133
and also "0's" included in x following "to" are all converted
to "9". Consequently, "from 160000 to 199999" is generated as
the normalized expression.
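
The digit substitutions of Steps S121 to S134 can be sketched as
string operations on x. The function name is hypothetical. One
deliberate deviation is assumed here: the "half" tests are made
before the plain "range" test, because an expression such as
"lower half of ... range" also ends with "range".

```python
def normalize_money_range(x: int, expression: str) -> str:
    """Build the normalized range for a money amount x and its wording."""
    s = str(x)
    if expression.endswith("or more"):              # S121 -> S122
        return f"from {s} to *"
    if expression.endswith("or less"):              # S123 -> S124
        return f"from * to {s}"
    if "lower half of" in expression:               # S128 -> S129, S130
        upper = s.replace("0", "5", 1)              # first "0" becomes "5"
        return f"from {s} to {upper}"
    if "higher half of" in expression:              # S131 -> S132 to S134
        lower = s.replace("0", "6", 1)              # first "0" becomes "6"
        upper = s.replace("0", "9")                 # every "0" becomes "9"
        return f"from {lower} to {upper}"
    if expression.endswith(("range", "level")):     # S125 -> S126, S127
        upper = s.replace("0", "9")
        return f"from {s} to {upper}"
    return s                                        # plain amount, no range
```

With x = 100000 this yields the three ranges worked out in the
examples above for "range", "lower half" and "higher half".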

The above process makes it possible to convert money
amount expressions, for example, to corresponding numerical
values and also to convert vague money amount expressions
including "or more" or "the lower half of ... range", for example,
to corresponding numerical values.

Taking a specific example, the operation of the above
embodiment will be explained.

Let it be assumed that a document shown in FIG. 15 is
input via the document input section 1 shown in FIG. 1. The
example document shown in FIG. 15 relates to release of new
products.

After a document like the illustrated one is input via
the document input section 1, the event specifying means 4 looks
up the knowledge information stored in the knowledge information
storing means 3, to specify the event described in the document
(Step S2 in FIG. 3).

The example shown in FIG. 15 corresponds to the first
item (<company info> released <product> <date>) in "module main"
described at lines 4 to 9 in FIG. 5. Accordingly, the event
described in this document is judged to be "release of new
product".

In the knowledge information shown in FIG. 5, event
definitions are described in a part between "module main" and
"endmodule". The attributes included in the event definitions,
such as <company info>, are defined under "module" or "def". For
example, the attribute <company info> is defined under "module"
at lines 15 to 19 and contains three definitions, that is,
(<business category> <company name>), (<business category
2> &connective; <company name>), and (<company name>).

The definition of <business category> is described
following "def" at line 12, and an applicable one among (maker
of *|company of ...|major company of ...|developer of ...|
retailer of ...|manufacturer of *) is selected as the attribute
value of the attribute <business category>. Accordingly, an
expression like "maker company of a PC" or "major PC company"
is judged to be an attribute value of <business category>. The
symbol "|" represents "or".

Where a synonym is to be included in a definition, a
portion in which a synonym is to be included is interposed between
"&" and ";", as indicated at line 17. In the illustrated example,
"connective" corresponds to the portion in which a synonym is
to be included and is defined in detail at line 16, as "connective"
= (specializing in|which produces|the maker of). Accordingly,
"Hashimoto Electric specializing in office automation" comes
under the second definition (<company name> &connective;
<business category 2>) of company information.
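
To make the role of the connective definition concrete, the
matching of this one example can be imitated with a regular
expression. This is a loose sketch only: the rule syntax of the
knowledge information is not reproduced, the company-name lexicon
is reduced to a single hypothetical entry, and all identifiers
are assumptions.

```python
import re
from typing import Dict, Optional

# Alternatives quoted in the text for "connective"
CONNECTIVE = r"(?:specializing in|which produces|the maker of)"
# Hypothetical one-entry lexicon standing in for <company name>
COMPANY_NAME = r"(?P<company_name>Hashimoto Electric)"
# Free text standing in for <business category 2>
BUSINESS_CATEGORY_2 = r"(?P<business_category_2>[A-Za-z ]+)"

COMPANY_INFO = re.compile(rf"{COMPANY_NAME} {CONNECTIVE} {BUSINESS_CATEGORY_2}")


def match_company_info(text: str) -> Optional[Dict[str, str]]:
    """Return the extracted attribute values, or None if the pattern does not apply."""
    m = COMPANY_INFO.fullmatch(text)
    return m.groupdict() if m else None


result = match_company_info("Hashimoto Electric specializing in office automation")
```

Any of the three connective alternatives is accepted in the same
position, which is the effect the synonym portion is meant to
have.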

Thus, in this embodiment, processing is performed in
a top-down fashion, enabling pattern matching suited to context.

After the event type is specified by the above process,
the tense acquiring means 7 acquires the sentence describing the
event to obtain tense information thereof. In the example
document shown in FIG. 15, the tense is the past ("released"),
so "past tense" is acquired as the tense information. The tense
acquired in this manner is attached to the normalized information
as "aspect=past", as indicated at line 2 in FIG. 16.

Subsequently, the attribute value extracting means 5
extracts attribute values (Step S3 in FIG. 3) according to the
specified event type. Specifically, the attribute value
extracting means 5 extracts attribute values by performing
pattern matching between the knowledge information shown in FIG.
5 and the document.

In the example shown in FIG. 15, "Hashimoto Electric"
is extracted as <organization name>, "JCN-compatible PC" is
extracted as <type> in <product info> about newly released
products, and "GNW Series" is extracted as <product name>.

The normalizing means 8 then determines whether or not
a date expression exists in the document (Step S4 in FIG. 3).
If a date expression exists, it is converted to a corresponding
numerical value.

The document shown in FIG. 15 includes the expression
"18th"; therefore, the normalizing means 8 acquires the document
creation date information and the tense information in Step S5
in FIG. 3, and performs the date expression conversion process
in Step S6.

If the document creation date is "October 19, 1993",
for example, information "release date" specifying "date" as its
type and "1993-10-18" as its value is attached to the normalized
information, as indicated at line 6 in FIG. 16. Subsequently,
in Step S7 in FIG. 3, the normalizing means 8 determines whether
or not a money amount expression exists. The document shown in
FIG. 15 includes expressions such as "one hundred seventy eight
thousand yen", and therefore, Step S8 is executed to determine
whether or not the money amount expression is in the prescribed
currency unit. For example, if the prescribed currency unit is
"yen" and the expression in question is "one hundred seventy eight
thousand yen", Step S10 is executed.

If an expression such as "$150" is included,
currency unit conversion is performed according to the exchange
rate ($1 = 130 yen) in Step S9, and then Step S10 is executed.
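
The currency check and conversion of Steps S8 and S9 might be
sketched like this. The rate is the one quoted in the example;
the table, the function name and the accepted input formats are
assumptions for illustration.

```python
import re

# Exchange rate from the example in the text: $1 = 130 yen
YEN_PER_UNIT = {"$": 130}


def to_prescribed_unit(expression: str) -> int:
    """Return the amount in yen for inputs like '$150' or '178,000 yen'."""
    text = expression.replace(",", "").strip()
    m = re.fullmatch(r"\$(\d+)", text)
    if m:                                   # not in the prescribed unit: convert (S9)
        return int(m.group(1)) * YEN_PER_UNIT["$"]
    m = re.fullmatch(r"(\d+) ?yen", text)
    if m:                                   # already in yen: no conversion needed
        return int(m.group(1))
    raise ValueError(f"unsupported money amount expression: {expression}")
```

Under the quoted rate, "$150" becomes 19500 yen before the money
amount expression conversion is applied.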

In Step S10, the character string "one hundred seventy
eight thousand yen" is converted to the value "178000".

Subsequently, in Step S11, it is determined whether
or not other numerical expressions exist. In the example shown
in FIG. 15, however, the first sentence includes no other
numerical expression than the date expression, and therefore,
Step S13 is executed.

In Step S13, the correlating means 10 determines
whether or not a proper name exists. In the example of FIG. 15,
the proper name "Hashimoto Electric" exists; accordingly, Step
S14 is executed.

In Step S14, the correlating means 10 acquires
information associated with Hashimoto Electric from among the
knowledge information stored in the knowledge information
storing means 3. The acquired information is, for example, as
follows:

0001 Hashimoto Electric <company name>

00011 Hashimoto Taro <president name>

00012 Okayama prefecture <location>

If there are a plurality of possibilities for
"Hashimoto Electric", it is determined whether or not the
document includes other proper names (Hashimoto Taro, Okayama
prefecture) that are stored in association with Hashimoto
Electric, to narrow down the possibilities.
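
One simple way to realize this narrowing is to count, for each
candidate, how many of its associated proper names occur in the
document. The sketch below assumes plain substring matching; the
function name and the second candidate code "0777" with its
associated names are hypothetical.

```python
from typing import Dict, List, Set


def narrow_down(candidates: Dict[str, Set[str]], document: str) -> List[str]:
    """Keep the candidate codes whose associated names co-occur most often."""
    if not candidates:
        return []
    scores = {
        code: sum(1 for name in related if name in document)
        for code, related in candidates.items()
    }
    best = max(scores.values())
    return [code for code, score in scores.items() if score == best]


candidates = {
    "0001": {"Hashimoto Taro", "Okayama prefecture"},
    "0777": {"Hashimoto Jiro", "Osaka prefecture"},   # hypothetical namesake
}
document = "Hashimoto Electric, based in Okayama prefecture, released a PC."
codes = narrow_down(candidates, document)
```

If several candidates tie, all of them are returned, so the
caller can decide how to treat a document that gives no further
clue.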

Then, in Step S14, a proper name code (e.g., 0001)
obtained as a result of narrowing is attached to the normalized
information (see line 4 in FIG. 16).

The correlating means 10 then determines in Step S15
whether or not a reference expression exists. In the example
shown in FIG. 15, no reference expression exists, and thus the
decision in this step is NO. Accordingly, in Step S18, the
generated normalized information and the document (or
information specifying the storage location of the document) are
stored in the document storing means 11, and the process is ended.

FIG. 17 shows another example document, and FIG. 18
shows an example of normalized information obtained by
processing the document shown in FIG. 17.

As indicated at line 3 in FIG. 18, the event described
in the document shown in FIG. 17 is merger information
(field=merger information) and the tense is the past (aspect=past).
Also, since the first sentence does not include the expression
"announced", "sentence end expression=predicate announce
absent" is described at line 2.

As the contents of "merging entity organization info"
at lines 5 to 30, Hokkaido Ohki Lift and Higashi Hokkaido Ohki
Lift are specified at lines 8 and 18 as merging organizations,
and <merging organization supplementary info> supplementary to
the information on these organizations is described at the
remaining lines.

At line 36 and the subsequent lines are described the
remaining sentences except the sentence which was subjected to
analysis.

In this example, the reference expression "the same
president" appears at line 4 in FIG. 17. Accordingly, the
description "reference target=previous" is added, as indicated
at line 25 in FIG. 18, thereby specifying that the reference
expression "the same president" refers to "Akutagawa Ryutaro
(0251)" (element 2) at lines 15 to 17.

The following explains an example of a process for
retrieving documents by looking up the normalized information
generated as described above.

FIG. 19 shows an example of an input screen displayed
at the user interface section 2 shown in FIG. 1. In the
illustrated example, documents including sales information on
products are searched for. Namely, search is performed for
documents describing <sales of products> as their event. In this
example, the name of an organization which released products is
entered in the uppermost box "Organization name:". In the next
box "Product type:", a type of products is entered. Further,
a price range of products is entered in the boxes "Price:", and
a release date range is entered in the boxes "Release date:".
The button "Search" shown at the bottom is operated to start
search after all items have been input.

FIG. 20 shows an example of a query input in the screen
shown in FIG. 19. In the illustrated example, "AAA" is input
as the organization name and "PC" is input as the product type.
Further, the range from "100000" yen to "300000" yen
is specified as the price, and the range from "1997" (year)
"1" (month) "1" (day) to "1997" (year) "6" (month) "30" (day)
is specified as the release date.

The query thus entered from the input screen is
assigned information indicative of attributes of the individual
input items, and then supplied to the document extracting means
12 via the event specifying means 4, the attribute value
extracting means 5, and the correlating means 10. As such
information to be assigned, "AAA", for example, is assigned
a tag <organization name>. Also, the price is converted into
a tag <price type=price unit=yen value="from 100000 to 300000">,
and the release date is converted into a tag <release date
type=date value="from 1997-1-1 to 1997-6-30">.

The document extracting means 12 extracts, from the
document storing means 11, documents having attribute values
matching the tags of the query supplied from the user interface
section 2. Specifically, since the document storing means 11
stores the normalized information together with the original
documents, the document extracting means 12 collates the
attribute values included in the normalized information with the
tags of the query and extracts matching documents.
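
The collation can be pictured as a comparison between normalized
values. In the sketch below, the dictionary layout, the key names
and the matching rules (substring match for the product type,
overlap test for price ranges) are assumptions chosen for
illustration, and the sample record is hypothetical.

```python
from datetime import date
from typing import Any, Dict


def ranges_overlap(a_low: int, a_high: int, b_low: int, b_high: int) -> bool:
    """True if the closed ranges [a_low, a_high] and [b_low, b_high] intersect."""
    return a_low <= b_high and b_low <= a_high


def matches_query(record: Dict[str, Any], query: Dict[str, Any]) -> bool:
    """Collate one document's normalized information with the query tags."""
    if query["organization name"] != record["organization name"]:
        return False
    if query["product type"] not in record["product type"]:
        return False
    if not ranges_overlap(*record["price"], *query["price"]):
        return False
    start, end = query["release date"]
    return start <= record["release date"] <= end


record = {   # hypothetical normalized information of one stored document
    "organization name": "AAA",
    "product type": "desktop PC",
    "price": (200000, 299999),
    "release date": date(1997, 2, 1),
}
query = {
    "organization name": "AAA",
    "product type": "PC",
    "price": (100000, 300000),
    "release date": (date(1997, 1, 1), date(1997, 6, 30)),
}
hit = matches_query(record, query)
```

Because both sides are already numerical, a price written out in
words in the original document can still be matched by a numeric
range in the query.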

The results of the search thus performed are output 
and displayed at a display device, not shown. 

FIG. 21 shows a screen template for displaying search
results. The illustrated example includes, as attribute values,
"Organization name", "Product type", "Product name", "Price",
"Release date" and "Heading" to show search results.

FIG. 22 shows an example of an actually displayed
screen. In this example, the item at line 1 indicates that the
organization "AAA" released desktop personal computers priced
in the 200000 to 299999 yen range on 1997/02/29 and that the
heading of the document concerned is "Low price PC released".

FIG. 23 shows another example of an input screen
displayed at the user interface section 2 appearing in FIG. 1.
In this example, search is performed for documents including
"Organization Merger Information". Specifically, documents
describing merger of organizations as their event are searched
for. In the illustrated example, the names of organizations to
be merged are input in the first and second boxes "Organization
name:". In the boxes "Merger date:", a range of merger date is
input. The button "Search" at the bottom is operated to start
search after all items have been entered.

FIG. 24 shows an example of a query input in the screen
shown in FIG. 23. In the illustrated example, "AAA" is input
as the organization name, and the range from "1997" (year)
"1" (month) "1" (day) to "1997" (year) "12" (month) "31"
(day) is input as the range of merger date.

With these items entered in the input screen, the
button "Search" is operated, whereupon tags are generated in the
same manner as described above, and are collated with the
normalized information stored in the document storing means 11
to search for documents.

FIG. 25 shows a screen template for displaying search
results of the query shown in FIG. 24. The illustrated example
includes, as attribute values, "Organization name",
"Organization name", "New organization name", "Merger date" and
"Heading" to show search results.

FIG. 26 shows an example of an actually displayed
screen.

In the illustrated example, the document retrieved
indicates that the companies with the organization names "AAA"
and "BBB" merged on "1997/04/01" into one company with
the new organization name "CCC", and the heading of the document
is "AAA. BBB. merges".

In the embodiment described above, input screens
suited to events to be searched are prepared, and required items
are input from the input screens, so that desired documents can
be acquired. As mentioned above, documents are stored in the
document storing means 11 in a manner associated with their
normalized information. Accordingly, even in the case where a
document describes the price of a newly released personal
computer in alphabetic characters "two hundred fifty thousand
yen", the document can be acquired by looking up its normalized
information by means of a query specifying the range from "200000"
yen to "300000" yen.

In the above embodiment, predetermined items are input 
from the input screen suited to the event to be searched, and 
documents matching the input items are searched for. 
Alternatively, a query may be input in the form of a sentence, 
and after the input sentence is normalized, matching documents 
may be searched for. An example of such a process for normalizing
a query will be now described with reference to the flowchart 
of FIG. 27. Upon start of the process shown in the flowchart, 
the following steps are executed. 

[S1511 The user interface section 2 is supplied with 
a query described in the form of a sentence. 

[S152] The event specifying means 4 specifies the type
of an event described in the query. Specifically, the event
specifying means 4 looks up the information (see FIG. 5) on the
event-expression mapping rules stored in the knowledge
information storing means 3, to specify the type of the event
described in the query.

[S153] The attribute value extracting means 5 extracts
attribute values by looking up the knowledge information stored 
in the knowledge information storing means 3. 

[S154] The normalizing means 8 determines whether or
not a date expression is included in the extracted attribute
values. If a date expression is included, the flow proceeds to
Step S155; if not, the flow proceeds to Step S157.

[S155] The creation date acquiring means 6 acquires
a date of creation of the query, and the tense acquiring means
7 acquires a tense of the query.

[S156] Looking up the query creation date information
and tense information thus acquired, the normalizing means 8
performs the "DATE EXPRESSION CONVERSION PROCESS" to convert the
date expression into corresponding numerical values. This
process was already explained in detail with reference to FIG.
6, and therefore, description thereof is omitted here.
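The following is only a rough sketch of how a creation date and a tense might be combined to resolve a bare day expression such as "18th". It is not a reproduction of the process of FIG. 6; the rule used here (past tense selects the most recent such day, future tense the next one), the function name and the sample creation dates are all assumptions made for illustration.

```python
from datetime import date


def resolve_day_of_month(day, created, tense):
    """Resolve a bare day of the month to a full date (hypothetical rule).

    past   -> the most recent such day on or before the creation date
    future -> the next such day on or after the creation date
    Raises ValueError if the resulting month has no such day (e.g. the 31st).
    """
    year, month = created.year, created.month
    if tense == "past":
        if day > created.day:          # that day has not yet occurred this month
            month -= 1
            if month == 0:
                year, month = year - 1, 12
    elif tense == "future":
        if day < created.day:          # that day has already passed this month
            month += 1
            if month == 13:
                year, month = year + 1, 1
    else:
        raise ValueError(f"unknown tense: {tense!r}")
    return date(year, month, day)


# Assumed creation date of 1993-10-19 and past tense ("released on the 18th").
released = resolve_day_of_month(18, date(1993, 10, 19), "past")
```

Under these assumptions the expression "18th" is resolved to 1993-10-18, which can then be stored and compared as a numerical date.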

[S157] The normalizing means 8 determines whether or
not a money amount expression is included in the extracted
attribute values. If a money amount expression is included, the
flow proceeds to Step S158; if not, the flow proceeds to Step
S161.

[S158] The normalizing means 8 determines whether or
not the money amount expression in question is in a prescribed
currency unit. If the money amount expression is in the
prescribed currency unit, the flow proceeds to Step S160; if not,
the flow proceeds to Step S159. Where the prescribed currency
unit is "yen", for example, and if a money amount expression in
the unit "$" exists, the flow proceeds to Step S159.

[S159] The unit converting means 9 reads out an
exchange rate from the storage section therein, and converts the
money amount expression into the prescribed currency unit.
If the expression "$100", for example, exists and
if the exchange rate is $1 = 130 yen, "$100" is converted
to "13000 yen".

[S160] The normalizing means 8 performs the "MONEY
AMOUNT EXPRESSION CONVERSION PROCESS" to convert the money
amount expression into a numerical value. This process was
already explained in detail with reference to FIG. 13; therefore,
description thereof is omitted here.

In the above example, "13000 yen" (character
string) is converted to "13000" (numerical value).
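Steps S158 through S160 may be sketched, purely for illustration, as follows. The rate table, the regular expressions and the function name are assumptions; the exchange rate of 130 yen per dollar is the one used in the example above, and the actual conversion process of FIG. 13 is not reproduced here.

```python
import re

# Hypothetical stand-in for the storage section of the unit converting means 9.
EXCHANGE_RATES_TO_YEN = {"$": 130}


def normalize_money(expression):
    """Convert a money amount expression to an integer number of yen."""
    text = expression.replace(" ", "").replace(",", "")

    # S158/S159: if the amount is not in yen, convert it with the stored rate.
    dollars = re.fullmatch(r"\$(\d+)", text)
    if dollars:
        amount = int(dollars.group(1)) * EXCHANGE_RATES_TO_YEN["$"]
        text = f"{amount}yen"

    # S160: convert the character string in yen to a numerical value.
    yen = re.fullmatch(r"(\d+)yen", text)
    if yen is None:
        raise ValueError(f"unsupported money expression: {expression!r}")
    return int(yen.group(1))


value = normalize_money("$100")
```

With the assumed rate, "$100" and "13000 yen" both normalize to the same numerical value, so a query stated in one unit can match a document stated in the other.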

[S161] The normalizing means 8 determines whether or
not there exists some other numerical expression. If there
exists some other numerical expression, the flow proceeds to Step
S162; if not, the flow proceeds to Step S163.

For example, if there exists an expression like "number
of shipping fifty thousand sets", the flow proceeds to Step S162.

[S162] The normalizing means 8 converts the numerical
expressions included in the attribute values into corresponding
numerical values. In the above example, the character string
"50000" is converted to a computable numerical value of
"50000".

[S163] The correlating means 10 determines whether or
not a proper name (e.g., "Hashimoto Electric") is included
in the attribute values. If a proper name is included, the flow
proceeds to Step S164; if not, the flow proceeds to Step S165.

[S164] The correlating means 10 extracts the proper
name, acquires a proper name code corresponding thereto from the
knowledge information storing means 3, and assigns the
proper name code to the corresponding attribute value.

For example, a proper name code "00011" corresponding
to "Hashimoto Electric" mentioned above is read out from the
knowledge information storing means and assigned.

The knowledge information storing means 3 stores
information generated by correlating relevant proper names with
one another, and accordingly, even in the case where a certain
proper name has a plurality of possibilities, it can be accurately
specified by looking up other correlated proper names.

Specifically, in the case where the proper name
"Hashimoto Electric" has two possibilities "Hashimoto Electric
Corp." and "Hashimoto Electric Inc." (companies with an
identical name exist), the president's name, location, etc.
described in the query, for example, are compared with respective
correlated proper names stored in the knowledge information
storing means 3, whereby the correct proper name can be acquired
by narrowing down the possibilities.
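The narrowing-down described above may be pictured with the following sketch. The table contents, the proper name codes "1001" and "1002", the second president's name and the locations are invented for illustration, and scoring by the number of shared correlated names is merely one possible way of comparing them.

```python
# Hypothetical excerpt of the knowledge information: each entity carries a
# proper name code and the proper names correlated with it.
KNOWLEDGE = [
    {"code": "1001", "name": "Hashimoto Electric Corp.",
     "aliases": {"Hashimoto Electric"},
     "correlated": {"President Nakayama", "Tokyo"}},
    {"code": "1002", "name": "Hashimoto Electric Inc.",
     "aliases": {"Hashimoto Electric"},
     "correlated": {"President Suzuki", "Osaka"}},
]


def resolve_proper_name(name, context_names):
    """Return the code of the single best candidate, or None if undecided."""
    candidates = [e for e in KNOWLEDGE
                  if name == e["name"] or name in e["aliases"]]
    if not candidates:
        return None
    # Score each candidate by how many correlated names appear in the context.
    scored = sorted(((len(e["correlated"] & context_names), e["code"])
                     for e in candidates), reverse=True)
    if len(scored) > 1 and scored[0][0] == scored[1][0]:
        return None  # the context does not separate the candidates
    return scored[0][1]


code = resolve_proper_name("Hashimoto Electric", {"President Nakayama"})
```

When the query mentions no correlated name at all, the sketch returns None rather than guessing, which leaves the ambiguity visible to a later step.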

[S165] The correlating means 10 determines whether or
not there exists a reference expression (expression like "the
company" or "both of them"). If such a reference expression
exists, the flow proceeds to Step S166; if not, the flow proceeds
to Step S168.

For example, if "the company", which is a reference
expression, exists, the flow proceeds to Step S166.
[S166] The correlating means 10 identifies a target
which the reference expression refers to.

In the case of "Hashimoto Electric, President Nakayama
announced. Hashimoto Computer, the same president, starts...",
"President Nakayama" is identified as the target which the
reference expression "the same president" refers to.

As a method for such identification, when a reference
expression such as "the company" or "the same president" is
detected, the corresponding proper name preceding the expression
may be identified as the target which the reference expression
refers to.
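The identification method just described (taking the corresponding proper name that precedes the reference expression) may be sketched as follows. The mapping from reference expressions to entity kinds and the representation of a sentence as an ordered list of mentions are assumptions made for this sketch.

```python
# Hypothetical mapping from a reference expression to the kind of entity
# it can refer to.
REFERENCE_KINDS = {
    "the company": "organization",
    "the same president": "person",
}


def resolve_references(mentions):
    """mentions: ordered (text, kind) pairs; kind is 'organization',
    'person' or 'reference'. Returns (text, target) pairs, where target is
    the nearest preceding proper name of the matching kind, or None."""
    last_seen = {}   # kind -> most recent proper name of that kind
    resolved = []
    for text, kind in mentions:
        if kind == "reference":
            target = last_seen.get(REFERENCE_KINDS.get(text))
            resolved.append((text, target))
        else:
            last_seen[kind] = text
            resolved.append((text, text))
    return resolved


mentions = [
    ("Hashimoto Electric", "organization"),
    ("President Nakayama", "person"),
    ("Hashimoto Computer", "organization"),
    ("the same president", "reference"),
]
result = resolve_references(mentions)
```

In this sketch "the same president" resolves to "President Nakayama" even though an organization name intervenes, because only preceding names of the corresponding kind are considered.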

[S167] The correlating means 10 acquires a proper name
code corresponding to the target which the reference expression
refers to, and assigns the acquired proper name code to the
reference expression.

In the above example, a proper name code "00010" for
"President Nakayama" is assigned to the reference expression
"the same president".

[S168] The correlating means 10 supplies normalized 
query information generated in this manner to the document 
extracting means 12. Consequently, looking up the thus- 
generated normalized query information, the document extracting 
means 12 searches the documents stored in the document storing 
means 11. 

For example, in the case where "Hashimoto Shuzo
released Sake Hashimoto." has been input as a query, the event
specifying means 4 looks up the knowledge information stored in
the knowledge information storing means 3, and judges that the
input query describes the event "release of new product".

The attribute value extracting means 5 extracts
"Hashimoto Shuzo" as <organization name>, "sake" as <product
type>, and "Hashimoto" as <product name>. The correlating means
10 acquires and assigns a proper name code corresponding to
"Hashimoto Shuzo", if any. If the proper name code for
"Hashimoto Shuzo" is "0111", for example, a tag <organization
name>Hashimoto Shuzo(0111)</organization name> is generated.

The document extracting means 12 looks up the
normalized information generated in the above manner, to extract
matching documents from the document storing means 11.
Specifically, the document extracting means 12 extracts, from
the document storing means 11, documents which include
"Hashimoto Shuzo" with the organization name tag and the
proper name code (0111), "sake" with the product type
tag, and "Hashimoto" with the product name tag, and of which
the event is "release of new product".

This process prevents a document including a sentence
"Mr. Hashimoto ordered sake produced by Hashimoto Shuzo.", for
example, from being retrieved as a result of search. Namely,
since the query and the normalized information of documents are
individually applied with tags indicative of extracted
attributes, it is possible to prevent "Hashimoto" as <product
name> from being confused with "Hashimoto" as <person name>.
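The effect of the tags may be illustrated with the following sketch, in which normalized information is represented as an event type plus a set of (tag, value) pairs. This representation, the function name and the event label "order of product" given to the second document are assumptions made for illustration.

```python
def matches(query, document):
    """A document matches when its event is the same as the query's and it
    contains every (tag, value) pair of the query."""
    return (query["event"] == document["event"]
            and query["attributes"] <= document["attributes"])


query = {
    "event": "release of new product",
    "attributes": {
        ("organization name", "Hashimoto Shuzo(0111)"),
        ("product type", "sake"),
        ("product name", "Hashimoto"),
    },
}

# "Hashimoto Shuzo released Sake Hashimoto."
doc_release = {
    "event": "release of new product",
    "attributes": {
        ("organization name", "Hashimoto Shuzo(0111)"),
        ("product type", "sake"),
        ("product name", "Hashimoto"),
    },
}

# "Mr. Hashimoto ordered sake produced by Hashimoto Shuzo."
doc_order = {
    "event": "order of product",   # hypothetical event label
    "attributes": {
        ("organization name", "Hashimoto Shuzo(0111)"),
        ("product type", "sake"),
        ("person name", "Hashimoto"),
    },
}

release_hit = matches(query, doc_release)
order_hit = matches(query, doc_order)
```

Both documents contain the same words, but only the first matches: in the second, "Hashimoto" carries the person name tag, so the pair ("product name", "Hashimoto") is absent.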

The following describes an example of a process of
clipping documents according to the above embodiment. FIG. 29
is a flowchart exemplifying a process of normalizing a query
transmitted from a user when documents are to be clipped. Upon
start of the process shown in the flowchart, the following steps
are executed.

[S180] The user interface section 2 is supplied with
a query from a certain user.

[S181] The event specifying means 4, the attribute
value extracting means 5 and the correlating means 10 perform
the process of Steps S151 through S167 shown in FIGS. 27 and 28,
to normalize the query.

[S182] The document extracting means 12 stores the
thus-normalized query (normalized information) and information
specifying the user who transmitted the query, in a manner
associated with each other.

[S183] The document extracting means 12 and the
importance calculating means 13 perform a "RELEVANCY
DETERMINATION PROCESS" to determine the degree of relevancy
between the query from each user and the documents stored in the
document storing means 11. This process will be described in
detail below with reference to FIG. 30.

Referring to FIG. 30, details of the "RELEVANCY
DETERMINATION PROCESS" appearing in FIG. 29 will be described.
Upon start of the process shown in the flowchart, the following
steps are executed.

[S201] The importance calculating means 13 calculates,
with respect to each user, the degree of relevancy between the
normalized query and documents with normalized information.

To calculate the degree of relevancy, a method may be
employed in which target documents are scored in accordance with
how many important expressions appearing in the normalized query
are included in the documents, for example, and documents scored
high are judged to be documents with high relevancy.
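One simple reading of this scoring method is sketched below: the score of a document is the number of expressions from the normalized query that it contains. The function names, the sample expressions and the decision to drop documents scoring zero are assumptions; how "important" expressions are selected is not specified here and is left outside the sketch.

```python
def relevancy_score(query_expressions, document_expressions):
    """Count how many query expressions are included in the document."""
    return sum(1 for e in query_expressions if e in document_expressions)


def rank_by_relevancy(query_expressions, documents):
    """Return document ids ordered from highest to lowest score,
    omitting documents that share no expression with the query."""
    scored = [(relevancy_score(query_expressions, d["expressions"]), d["id"])
              for d in documents]
    ordered = sorted(scored, key=lambda pair: -pair[0])  # stable for ties
    return [doc_id for score, doc_id in ordered if score > 0]


query_expressions = ["Hashimoto Shuzo(0111)", "sake", "release of new product"]
documents = [
    {"id": "doc2", "expressions": {"sake"}},
    {"id": "doc1", "expressions": {"Hashimoto Shuzo(0111)", "sake",
                                   "release of new product"}},
    {"id": "doc3", "expressions": {"JCN-compatible PC"}},
]

ranking = rank_by_relevancy(query_expressions, documents)
```

Because proper names are compared together with their proper name codes in this sketch, two companies sharing a name would not inflate each other's score.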

[S202] Looking up the results of calculation by the
importance calculating means 13, the document extracting means
12 extracts documents with high relevancy.

[S203] The document extracting means 12 extracts
documents whose normalized information includes a date, money
amount and numerical value matching those included in the
normalized query.

[S204] The document extracting means 12 transmits the 
matching documents to the user through the network 21. 

Referring now to FIG. 31, an example of a process
executed in the document processing system 20 when a new document
is transmitted from the server 23, for example, will be described.

Upon start of the process shown in the flowchart, the
following steps are executed.

[S230] The document input section 1 receives a new
document supplied thereto from the server 23, for example,
through the network 21.

[S231] The event specifying means 4, the attribute
value extracting means 5 and the correlating means 10 perform
the document normalization process.

Specifically, the event specifying means 4, the
attribute value extracting means 5 and the correlating means 10
execute the process shown in FIGS. 3 and 4, thereby to generate
normalized information corresponding to the input document.

[S232] The document extracting means 12 and the
importance calculating means 13 execute the "RELEVANCY
DETERMINATION PROCESS" shown in FIG. 30. If, as a result, it
is found that the generated normalized information matches a
certain query, the newly input document is sent to the user who
transmitted the query.

According to the process described above, in cases
where a new document is input, the degree of relevancy between
the normalized information of the input document and the
normalized query from each user is calculated, and if the
relevancy is found to be high, the document is sent to a
corresponding user. It is therefore possible to accurately
select and transmit documents that suit the user's request.
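The clipping flow (storing each user's normalized query, then testing every newly input document against the stored queries) may be sketched as follows. The class name, the threshold of two shared expressions and the dictionary layout are assumptions made for this sketch and are not taken from the embodiment.

```python
class ClippingService:
    """Hypothetical sketch of the clipping flow for newly input documents."""

    def __init__(self, threshold=2):
        self.threshold = threshold   # assumed cut-off for "high" relevancy
        self.stored = []             # (user, normalized query) pairs

    def register(self, user, query):
        # Store the normalized query together with the user who sent it.
        self.stored.append((user, query))

    def recipients(self, document):
        # Score the new document against each stored query, then require
        # dates, money amounts and numbers named in the query to match.
        users = []
        for user, query in self.stored:
            score = len(query["expressions"] & document["expressions"])
            values_match = all(document["values"].get(key) == value
                               for key, value in query["values"].items())
            if score >= self.threshold and values_match:
                users.append(user)
        return users


service = ClippingService()
service.register("user_a", {
    "expressions": {"release of new product", "Hashimoto Electric(0001)"},
    "values": {},
})
service.register("user_b", {
    "expressions": {"release of new product", "Hashimoto Shuzo(0111)"},
    "values": {"price_yen": 13000},
})

new_document = {
    "expressions": {"release of new product", "Hashimoto Electric(0001)",
                    "JCN-compatible PC"},
    "values": {"release_date": "1993-10-18"},
}
send_to = service.recipients(new_document)
```

In this sketch only "user_a" receives the new document; the query of "user_b" shares a single expression with it, which falls below the assumed threshold.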

The above-described processing functions can be
performed by a computer. In this case, the contents of the
functions to be accomplished by the document processing system
are described in a program recorded in a computer-readable
recording medium. By executing the program on a computer, it
is possible to perform the above-described process. The
computer-readable recording medium includes a magnetic recording
device, a semiconductor memory and the like.

To distribute the program to the market, the program
may be stored in portable recording media such as CD-ROMs (Compact
Disk Read Only Memories) or floppy disks. Alternatively, the
program may be stored in the storage device of a computer
connected to a network and may be transferred to other computers
through the network. To execute the program by a computer, the
program stored in a hard disk unit or the like of the computer
is loaded into the main memory and executed.

As described above, according to the present invention,
an event described in a document to be processed is specified,
and attribute values of attributes relating to the specified
event are extracted and correlated with entities in the real world
to generate information, which is then looked up when performing
document retrieval or clipping. Accordingly, documents can be
retrieved or clipped based on accurate recognition of the
individual attribute values, so that the accuracy of document
retrieval or clipping can be enhanced.

The foregoing is considered as illustrative only of
the principles of the present invention. Further, since
numerous modifications and changes will readily occur to those
skilled in the art, it is not desired to limit the invention to
the exact construction and applications shown and described, and
accordingly, all suitable modifications and equivalents may be
regarded as falling within the scope of the invention in the
appended claims and their equivalents.

CLAIMS

1. A document processing system for storing input
documents after subjecting the documents to a predetermined
process, and for retrieving or clipping documents matching a
given query from the stored documents, comprising:

knowledge information storing means for storing
knowledge information necessary for processing an input
document;

event specifying means for specifying a type of an
event described in the input document by looking up the knowledge
information stored in said knowledge information storing means;

attribute value extracting means for extracting, from
the document, attribute values of attributes relating to the
event specified by said event specifying means by looking up the
knowledge information stored in said knowledge information
storing means;

correlating means for correlating the attribute
values extracted by said attribute value extracting means with
entities in real world by looking up the knowledge information
stored in said knowledge information storing means;

document storing means for storing the attribute
values correlated by said correlating means and the document or
information specifying a storage location thereof in a manner
associated with each other; and

document extracting means for retrieving or clipping
a target document by looking up the attribute values and the
query.

2. The document processing system according to claim
1, wherein, if the attribute values include a proper name, said
correlating means specifies an entity in the real world that is
represented by the proper name by looking up other attribute
values, and assigns the proper name predetermined information
uniquely identifying the specified entity, and

said document extracting means looks up the
predetermined information assigned by said correlating means to
perform retrieval or clipping.

3. The document processing system according to claim
1, wherein, if the attribute values include a reference
expression, such as "the same ..." or "both of them", said
correlating means specifies an attribute value which the
reference expression refers to.

4. The document processing system according to claim
3, further comprising importance calculating means for
calculating a degree of importance of a target document by looking
up a frequency of occurrence of a keyword included in the
document,

said importance calculating means equally processing
the keyword and the reference expression whose target of
reference is specified by said correlating means.

5. The document processing system according to claim
1, further comprising normalizing means for converting a
numerically convertible attribute value, among the attribute
values, to a corresponding numerical value, thereby normalizing
the numerically convertible attribute value, and wherein

said document extracting means looks up information
normalized by said normalizing means to perform retrieval or
clipping.

6. The document processing system according to claim
5, further comprising unit converting means for converting a unit
of the numerical value obtained by said normalizing means into
a predetermined unit.

7. The document processing system according to claim
5, further comprising tense acquiring means for acquiring tense
of a predetermined sentence constituting the document, and

creation date acquiring means for acquiring a date of
creation of the document, and wherein

said normalizing means looks up the tense of the
document acquired by said tense acquiring means and the creation
date acquired by said creation date acquiring means, to estimate
a definite value of an attribute value indicating a date or a
term.



8. The document processing system according to claim
, further comprising importance calculating means for
calculating a degree of importance of a target document by looking
up a frequency of occurrence of a keyword included in the
document,

said importance calculating means calculating the
degree of importance taking account of the date or term estimated
by said normalizing means.

9. The document processing system according to claim
1, wherein said event specifying means, said attribute value
extracting means and said correlating means process the query
in a manner similar to that in which the document is processed,
and

said document extracting means looks up the attribute
values of the document and of the query correlated by said
correlating means, to perform retrieval or clipping.

10. A computer-readable recording medium recording
a program for causing a computer to perform a process of storing
input documents after subjecting the documents to a
predetermined process and retrieving or clipping documents
matching a given query from the stored documents, wherein the
program causes the computer to function as

knowledge information storing means for storing
knowledge information necessary for processing an input
document,

event specifying means for specifying a type of an
event described in the input document by looking up the knowledge
information stored in the knowledge information storing means,

attribute value extracting means for extracting, from
the document, attribute values of attributes relating to the
event specified by the event specifying means by looking up the
knowledge information stored in the knowledge information
storing means,

correlating means for correlating the attribute
values extracted by the attribute value extracting means with
entities in real world by looking up the knowledge information
stored in the knowledge information storing means,

document storing means for storing the attribute
values correlated by the correlating means and the document or
information specifying a storage location thereof in a manner
associated with each other, and

document extracting means for retrieving or clipping
a target document by looking up the attribute values and the
query.